Web Content Mining for Information on Information Scientists
نویسنده
چکیده
This paper presents a search system for information on scientists which was implemented prototypically for the area of information science, employing Web Content Mining techniques. The sources that are used in the implemented approach are online publication services and personal homepages of scientists. The system contains wrappers for querying the publication services and information extraction from their result pages, as well as methods for information extraction from homepages, which are based on heuristics concerning structure and composition of the pages. Moreover a specialised search technique for searching for personal homepages of information scientists was developed.
منابع مشابه
Data Extraction using Content-Based Handles
In this paper, we present an approach and a visual tool, called HWrap (Handle Based Wrapper), for creating web wrappers to extract data records from web pages. In our approach, we mainly rely on the visible page content to identify data regions on a web page. In our extraction algorithm, we inspired by the way a human user scans the page content for specific data. In particular, we use text fea...
متن کاملA Technique for Improving Web Mining using Enhanced Genetic Algorithm
World Wide Web is growing at a very fast pace and makes a lot of information available to the public. Search engines used conventional methods to retrieve information on the Web; however, the search results of these engines are still able to be refined and their accuracy is not high enough. One of the methods for web mining is evolutionary algorithms which search according to the user interests...
متن کاملA web crawler design for data mining
The content of the web has increasingly become a focus for academic research. Computer programs are needed in order to conduct any large-scale processing of web pages, requiring the use of a web crawler at some stage in order to fetch the pages to be analysed. The processing of the text of web pages in order to extract information can be expensive in terms of processor time. Consequently a dist...
متن کاملWeb Content Mining Techniques and Tools
with the spectacularly and unpredictable growth of information available over the Internet, WWW has become a powerful platform for storage & retrieval of information. Due to the heterogeneity & unstructured nature of data on WWW, searching of information is becoming cumbersome & time consuming task. Web mining came as solution for the above problem. Web mining is the means of utilizing data min...
متن کاملWeb Mining: a Comparative Study
Currently, World-Wide Web has developed to a distributed information space with nearly 100 million workstations and several billion pages, which brings the people great trouble in finding needed information although huge amount of information available on webs. The search engine is a very important tool for people to obtain information on Internet, but the low-precision and lowrecall exist wide...
متن کامل